A sequential test for variable selection in high dimensional complex data
Abstract
Given a high-dimensional p-vector of continuous predictors X and a univariate response Y, principal fitted components (PFC) provide a sufficient reduction of X that retains all regression information about Y contained in X while reducing the dimensionality. The reduction is a set of linear combinations of all p predictors, and because it uses a flexible set of basis functions, it can detect predictors related to Y through complex, nonlinear relationships. When a possibly large number of irrelevant predictors is present, however, the accuracy of the sufficient reduction suffers. The proposed method adapts a sequential test to PFC to obtain a “pruned” sufficient reduction that sheds the irrelevant predictors. The sequential test is based on the likelihood ratio, whose expression is derived under different covariance structures of X | Y. The resulting reduction has improved accuracy and also allows the relevant variables to be identified. © 2014 Elsevier B.V. All rights reserved.
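To make the PFC reduction above concrete, here is a minimal sketch under the simplest covariance structure, assuming the isotropic model X_y = μ + Γβf_y + σε with ε standard normal. In that model the estimate of Γ is the first d eigenvectors of the covariance of the values fitted by regressing the centered X on f_y, and the error variance is σ̂² = (tr Σ̂ − Σ_{j≤d} λ_j(Σ̂_fit))/p. The cubic polynomial basis and the function names `polynomial_basis` and `isotropic_pfc` are hypothetical choices made for illustration. This sketch does not implement the paper's sequential likelihood-ratio test or its other covariance structures.

```python
import numpy as np


def polynomial_basis(y, degree=3):
    """Hypothetical flexible basis f_y: centered powers y, y^2, ..., y^degree."""
    y = np.asarray(y, dtype=float)
    F = np.column_stack([y ** k for k in range(1, degree + 1)])
    return F - F.mean(axis=0)  # centered, so the intercept is absorbed by mu


def isotropic_pfc(X, y, d=1, degree=3):
    """Sketch of PFC under an assumed isotropic error covariance sigma^2 * I.

    Returns (gamma_hat, sigma2_hat, loglik_max); gamma_hat holds the first d
    eigenvectors of the fitted covariance Sigma_fit = Xc' P_F Xc / n.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                   # MLE of mu is the sample mean
    F = polynomial_basis(y, degree)
    B, *_ = np.linalg.lstsq(F, Xc, rcond=None)
    fitted = F @ B                            # P_F Xc: OLS fit of each predictor on f_y
    sigma_hat = Xc.T @ Xc / n                 # marginal sample covariance of X
    sigma_fit = fitted.T @ fitted / n         # covariance of the fitted values
    evals, evecs = np.linalg.eigh(sigma_fit)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]           # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    gamma_hat = evecs[:, :d]
    sigma2_hat = (np.trace(sigma_hat) - evals[:d].sum()) / p
    loglik_max = -0.5 * n * p * (1.0 + np.log(2.0 * np.pi) + np.log(sigma2_hat))
    return gamma_hat, sigma2_hat, loglik_max


# Usage: only predictors 0 and 1 depend on Y, and they do so nonlinearly (through Y^2).
rng = np.random.default_rng(0)
n, p = 500, 10
y = rng.normal(size=n)
X = rng.normal(scale=0.5, size=(n, p))
X[:, 0] += y ** 2
X[:, 1] += y ** 2
gamma_hat, sigma2_hat, loglik_max = isotropic_pfc(X, y, d=1, degree=3)
reduction = X @ gamma_hat  # R(X) = gamma_hat' X, one linear combination of all p predictors
print(np.round(np.abs(gamma_hat[:, 0]), 2))
```

In this toy example the reduction still assigns a small nonzero weight to every irrelevant predictor. That residual weight is the kind of contamination that a pruning step, such as the sequential test described above, is meant to remove.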
Journal: Computational Statistics & Data Analysis
Volume: 81
Publication year: 2015